ABSTRACT

The gain, or difference, score is defined as the 
difference between the posttest score and the pretest score for an 
individual. Gain scores appear to be a natural measure of growth for 
education and the social sciences, but they contain two sources of 
measurement error, error in either the pretest or posttest scores, 
and cannot be considered perfectly reliable. This study assessed the 
effect that different combinations of pretest and posttest 
reliability coefficients, pretest and posttest standard deviations, 
and the correlation between the pretest and posttest scores have on: 
(1) raw gain score reliability coefficients; (2) residual gain score 
coefficients; (3) estimated true score reliability coefficients; and 
(4) the correlation between pretest scores and raw gain scores. 
Simulations indicated that the reliability coefficients of the 
pretest and posttest scores determined to a large extent whether raw 
gain scores, residual gain scores, and estimated true gain scores were
reliable. The study demonstrated that, for standardized tests of 
educational research such as those commonly encountered in elementary 
and secondary schools, gain scores can be useful indicators of 
progress if the user is knowledgeable about the proper type of gain 
score to use and its correct interpretation. (Contains 6 tables and 
40 references.) (SLD) 

Background

The gain (or difference) score is defined as the difference between the 
posttest score and the pretest score for an individual. Gain scores appear to 
be a natural measure of growth for education and the social sciences. Willett 
(1988-1989) wrote that, "The very notion of learning implies growth and
change" (p. 346). "Change phenomena ... such as the acquisition of knowledge,
reduction of anxiety, positive changes in self-concept, and increase in
productivity of human interactions are most validly viewed within the concept
of change" (Corder-Bolz, 1973, p. 959).

Unfortunately, gain scores contain two sources of measurement error, the 
error in the pretest scores and the error in the posttest scores. Assuming 
that pre- and posttest scores are equally reliable, the two sources of error 
result in gain scores that are ordinarily less reliable than either the pre- 
or posttest scores. As the reliability of gain scores decreases, their 
usefulness in education and the social sciences also decreases. 
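
The loss of reliability described above can be illustrated with the standard
classical test theory expression for the reliability of a difference score.
The following is a minimal sketch, not the study's own computation; the
function name and the numerical inputs are hypothetical and chosen only for
illustration.

```python
def gain_reliability(sd_x, sd_y, r_xx, r_yy, r_xy):
    """Classical-test-theory reliability of the raw gain D = Y - X.

    sd_x, sd_y : pretest and posttest standard deviations
    r_xx, r_yy : pretest and posttest reliability coefficients
    r_xy       : correlation between pretest and posttest scores
    """
    cov_xy = r_xy * sd_x * sd_y
    true_var = r_xx * sd_x**2 + r_yy * sd_y**2 - 2 * cov_xy  # true-gain variance
    obs_var = sd_x**2 + sd_y**2 - 2 * cov_xy                 # observed-gain variance
    return true_var / obs_var


# Hypothetical values: both tests have reliability .80, equal standard
# deviations, and a pre-post correlation of .70.
r_dd = gain_reliability(sd_x=10.0, sd_y=10.0, r_xx=0.80, r_yy=0.80, r_xy=0.70)
print(round(r_dd, 2))  # 0.33
```

With these illustrative inputs, two tests that each have a reliability of .80
yield a gain score whose reliability is only about .33.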

"Unreliability places a question mark after the score and causes any judgment 
based on it to be tentative to some extent. The accuracy of prediction that 
is possible to achieve is limited by the reliability of the measure through 
which the performance is being manifested" (Stanley, 1971, p. 358).

Historically, gain scores have been used for a variety of purposes. 

They have been used: 

1) To represent the gain or loss of some skill for a specific
individual (Rogosa, Brandt, & Zimowski, 1982);

2) As a dependent variable in an experimental or quasi-experimental
research design (Cronbach & Furby, 1970; Fortune & Hutson, 1984;
Rogosa, Brandt, & Zimowski, 1982);

3) As a criterion variable in a correlational study or a linear
regression formula used in an attempt to predict future behavior
(Cronbach & Furby, 1970; Fortune & Hutson, 1984; Rogosa, Brandt, &
Zimowski, 1982);

4) To identify subjects for treatment or selection on the basis of
their large or small gain scores (Cronbach & Furby, 1970; Fortune &
Hutson, 1984); and,

5) To represent a construct, such as defining "self-satisfaction" as the
"gain between ratings of self and ideal-self on an esteem scale"
(Cronbach & Furby, 1970, p. 79).

Gain scores have both detractors and supporters, with more of the former
than the latter. Detractors recognize the intuitive appeal of gain scores,
but assert that, "The fact that test scores are not perfectly reliable often 
makes this obvious procedure [the use of gain scores] produce absurd results" 
(Lord, 1956, p. 421). 

Detractors list five reasons why gain scores are not appropriate. 

First, the posttest score is the most appropriate dependent variable in any 
experimental research design (Campbell & Stanley, 1966; Cronbach & Furby,
1970; Feldt, 1958; Knapp, 1980; Linn & Slinde, 1977). The advantage of
pretest scores is that they can be used as a blocking variable or as a
covariate to increase the precision of the analysis (Feldt, 1958).

Second, the use of gain scores in quasi-experimental designs may not
prevent the confounding that exists between gain scores and pretest scores
(Cronbach & Furby, 1970; Fortune & Hutson, 1984; Kenny, 1975; Linn & Slinde,
1977). When random assignment is not possible, the question must be asked:
Are the gains or losses measured by the dependent variable (gain scores) the
result of the treatment or of differences that existed before the
treatment? (Kenny, 1975).

Third, it is well established that the correlation between a set of
pretest scores and gain scores is ordinarily spuriously negative, even if
there is no true correlation between the two variables (Bereiter, 1963;
Cronbach & Furby, 1970; Diederich, 1956; Linn & Slinde, 1977; Lord, 1956,
1958, 1963; Thomson, 1924; Thorndike, 1924, 1966; Traub, 1994). Any group
selected on the basis of their large raw gain scores will ordinarily contain
an unusually large number of subjects with low pretest scores and an unusually
small number of subjects with high pretest scores.

Fourth, raw gain scores have low reliability as derived by the
procedures of classical test theory (Bereiter, 1963; Fortune & Hutson, 1984;
Linn & Slinde, 1977; Lord, 1963). Whenever the pre- and posttest
reliability coefficients are equal and the pre- and posttest standard
deviations are equal, raw gain score reliability coefficients are
disappointingly low (Linn & Slinde, 1977).

Last, the pre- and posttest instruments may be measuring different
constructs, resulting in gain scores that are uninterpretable (Bereiter, 1963;
Linn & Slinde, 1977; Lord, 1956, 1958; Traub, 1994). For example, parallel
measures of the construct of mathematical ability may be measuring subtraction
skills in the pretest measure (i.e., in a standardized achievement test at the
end of second grade) and multiplication skills in the posttest measure (i.e.,
in a parallel standardized achievement test at the end of third grade); the
result is a gain score that defies explanation. Lord (1958) cautioned that
if the subjects under study have changed, even identical pre- and posttest
instruments may not be measuring the same construct.

Supporters of gain scores (e.g., Engelhart, 1967; Maxwell & Howard, 1981;
Overall & Woodward, 1975; Richards, 1975; Rogosa & Willett, 1983; and 
Zimmerman & Williams, 1982a, 1982b) admit that gain scores may not be reliable 
under most conditions. They insist, however, that in certain situations gain 
scores can be reliable; their point is that researchers should not 
automatically dismiss the use of gain scores when planning their research. 
Zimmerman and Williams (1982a) state that, "Our arguments indicate that gain 
scores can be reliable and it would be premature to discard such measures in 
research" (p. 153).

Some psychometricians point out that the assumption of equal pre- and
posttest reliability coefficients and equal pre- and posttest standard
deviations made by the detractors of gain scores is probably not realistic in
longitudinal studies in education and the social sciences (Feldt & Brennan,
1989; Zimmerman & Williams, 1982a). Of particular interest to this study,
Zimmerman and Williams assert that when the reliability coefficient and the
standard deviation of the posttest scores exceed the reliability coefficient
and the standard deviation of the pretest scores, respectively, raw gain
scores can be reliable.

Some critics of raw gain scores are less strongly opposed to, or even
support, modified gain scores (Feldt & Brennan, 1989). One modified gain
score is the residual gain score estimate. The residual gain score estimate
is the difference between the actual posttest performance and the performance
predicted by a linear regression equation with the pretest score as the
predictor variable.
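
As a concrete illustration of this definition, the sketch below fits an
ordinary least-squares line of posttest on pretest scores and returns each
examinee's residual. The scores are hypothetical (they are not data from this
study), and the function name is an assumption made for the example. A
general property of least-squares residuals is that they sum to zero and have
zero covariance with the predictor, which the last lines check numerically.

```python
def residual_gains(pre, post):
    """Return actual posttest minus the posttest predicted from the pretest."""
    n = len(pre)
    mean_x = sum(pre) / n
    mean_y = sum(post) / n
    sxx = sum((x - mean_x) ** 2 for x in pre)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pre, post))
    slope = sxy / sxx                      # least-squares regression slope
    intercept = mean_y - slope * mean_x    # least-squares intercept
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical scores for six examinees.
pre = [40, 45, 50, 55, 60, 65]
post = [48, 50, 59, 58, 70, 69]
res = residual_gains(pre, post)

mean_pre = sum(pre) / len(pre)
cov_pre_res = sum((x - mean_pre) * r for x, r in zip(pre, res))

print([round(r, 2) for r in res])
print(abs(sum(res)) < 1e-9, abs(cov_pre_res) < 1e-9)  # True True
```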

Residual gain scores have an advantage over raw gain scores in that they
are not correlated with the pretest score (Willett, 1988-1989). For this
reason, residual gain scores may be useful to educators. One such use is the
identification of individuals or schools where treatment (presumably, the
educational program) has resulted in achievement gains which are greater or
less than reasonably could be expected (Cronbach & Furby, 1970; O'Connor,
1972). The generally accepted belief is that residual gain scores are
slightly more reliable than raw gain scores, particularly when pre- and
posttest reliability coefficients are equal and pre- and posttest standard
deviations are equal (DuBois, 1957; Linn & Slinde, 1977; Manning & DuBois,
1962; Veldman & Brophy, 1974).

A second alternative to the raw gain score approach is to estimate the
true gain score. Whenever one talks about raw gain scores, the score of
interest is the true gain score, that is, the gain score that would be
obtained if there were no errors of measurement. One of the advantages of
estimated true gain scores for research concerning student achievement is that
estimated true gain scores are positively correlated with pretest scores. The
result is that groups identified on the basis of their large gain scores
contain many subjects with high pretest scores. This reflects the fact, not
apparent when examining raw gain scores, that the brightest students tend to
learn the most. When the pre- and posttest reliability coefficients are
unequal and/or the pre- and posttest standard deviations are unequal,
estimated true gain scores are considered more reliable than raw gain scores
(Linn & Slinde, 1977).

Even though the prevailing wisdom among psychometricians appears to be 
that gain scores are unreliable and should be avoided as indicators of change, 
their use in evaluations and educational research is quite common. Zimmerman 
and Williams (1982a) have stated that, "Empirical studies are needed to 
determine how often and under what circumstances gain scores can be reliable 
in practical measurement situations" (p. 153).

Objectives

The factors affecting the reliability of raw gain scores and estimated
true gain scores are the: 1) reliability coefficients of the pre- and posttest
scores; 2) standard deviations of the pretest and posttest scores; and, 3)
correlation between the pre- and posttest scores. The factors affecting the
reliability coefficient of residual gain scores are the: 1) reliability
coefficients of the pre- and posttest scores; and, 2) correlation between
the pre- and posttest scores. The factors affecting the correlation between
raw gain scores and pretest scores are the: 1) standard deviations of the
pre- and posttest scores; and, 2) correlation between the pre- and
posttest scores.

The purpose of this study was to investigate the effect that different
combinations of the pre- and posttest reliability coefficients, pre- and
posttest standard deviations, and/or the correlation between the pre- and
posttest scores have on: 1) raw gain score reliability coefficients; 2)
residual gain score reliability coefficients; 3) estimated true gain score
reliability coefficients; and, 4) the correlation between pretest scores and
raw gain scores. The range of values for each of the three factors studied
was limited, but included values appropriate to educational research
concerning student achievement as measured by standardized achievement tests.

Method 

This study began by identifying a reasonable range of values for the
pre- and posttest reliability coefficients, standard deviations, and
correlations between pre- and posttest scores. Of particular interest were
values of these statistics that would be of a magnitude commonly associated
with those found on standardized measures of educational achievement used in
elementary and secondary schools. The following sections describe the values
chosen for this study and the rationale for each.

The pre- and posttest reliability coefficients used for this study
ranged from .75 to .95 in increments of .05, because the selected values
covered the range of values reported in the technical manuals for the
California Achievement Test, Form E and the Iowa Test of Basic Skills. There
seems to be general agreement that in longitudinal studies using different
pre- and posttest measures, pre- and posttest reliability coefficients of
stability and equivalence should be used in the calculation of gain score
reliability coefficients (Feldt & Brennan, 1989; O'Connor, 1972; Stanley,
1967). The California Achievement Test, Form E & F Technical Manual
(CTB/McGraw-Hill, 1987) reported that alternate form reliability coefficients
for subtests in reading, arithmetic, and language arts, and for the total
reading, total language, total arithmetic, and total test battery for grades 3
through 12 ranged from .71 to .96. The Technical Manual (Psychological
Corporation, 1993) reported that alternate form reliability coefficients for
subtests in reading and for total reading, total mathematics, and total
language tests ranged from .79 to .90.

Zimmerman and Williams (1982a, 1982b) algebraically manipulated the
standard formula for the reliability of raw gain scores, introducing lambda,
the ratio of the standard deviation of the pretest scores to the standard
deviation of the posttest scores. Their purpose was to illustrate the effect
that the ratio of the two standard deviations had on raw gain score
reliability coefficients. Similarly, formulas for the reliability of
estimated true gain scores and the correlation between pretest scores and raw
gain scores can be algebraically manipulated so that the effect of lambda on
estimated true gain score reliability coefficients and the correlation between
pretest scores and raw gain scores can be studied. The standard deviations of
the pre- and posttest scores do not affect the reliability of residual gain
scores.

To best illustrate the effect of lambda, the lambda values used in this
study were .50, .67, .80, 1.00, 1.25, 1.50, and 2.00. The reciprocals of the
lambda values of .50, .67, and .80 are the lambda values of 2.00, 1.50, and
1.25, respectively. Lambda values of 1.25, 1.50, and 2.00 were used to model
situations when the standard deviation of the pretest scores was 25%, 50%, and
100% greater than the value for the standard deviation of the posttest scores.
Lambda values of .50, .67, and .80 were used to model situations when the
standard deviation of the posttest scores was 100%, 50%, and 25% greater than
the standard deviation of the pretest scores.

Values selected for the correlation between the pre- and posttest scores
for this study were .50 to .90 in increments of .10 because the selected
values covered a plausible range of values. Two studies have used similar
values. Martin (1985) reported in a study on raw gain scores that
correlations between the pre- and posttest scores for the Iowa Test of Basic
Skills (Riverside, 1978) ranged from .54 to .94. Rachor and Cizek (1994), in a
study on the reliability of raw gain scores, reported that the correlation
between the pre- and posttest scores for the California Achievement Test, Form
E (CTB/McGraw-Hill, 1985) ranged from .49 to .93.

There is a direct relationship between the values for the pre- and
posttest reliability coefficients and the maximum possible value for the
correlation between the pre- and posttest scores. The correlation between
the pre- and posttest scores cannot exceed the product of the square roots of
the reliability coefficients of the pre- and posttest scores (Crocker &
Algina, 1986). Table 1 lists the highest possible value for the correlation
between the pre- and posttest scores for all possible combinations of the
pre- and posttest reliability coefficients which were used in this study.
When raw gain, residual gain, and estimated true gain score reliability
coefficients, and the correlation between pretest scores and raw gain scores
were calculated, the values used for the correlation between the pre- and
posttest scores were those values in the range selected for this study that
were equal to or less than the highest possible value for the correlation
between the pre- and posttest scores.
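
The bound described above can be tabulated directly. The following sketch
computes the upper bound for each pair of reliability coefficients in the
range used in this study and filters the grid of correlations accordingly. It
is an illustration only: the function names are hypothetical, and whether the
study also evaluated the bound itself as an additional correlation value is
not clear from the text, so the sketch simply filters the grid.

```python
from math import sqrt

reliabilities = [0.75, 0.80, 0.85, 0.90, 0.95]
correlations = [0.50, 0.60, 0.70, 0.80, 0.90]


def max_correlation(r_xx, r_yy):
    """Upper bound on the pre-post correlation implied by the two reliabilities."""
    return sqrt(r_xx * r_yy)


def admissible_correlations(r_xx, r_yy):
    """Grid correlations that do not exceed the upper bound."""
    bound = max_correlation(r_xx, r_yy)
    # Small tolerance so that a grid value equal to the bound is kept
    # despite floating-point rounding.
    return [r for r in correlations if r <= bound + 1e-12]


# One row per pretest reliability, one column per posttest reliability.
for r_xx in reliabilities:
    row = [round(max_correlation(r_xx, r_yy), 3) for r_yy in reliabilities]
    print(r_xx, row)

print(admissible_correlations(0.75, 0.80))  # [0.5, 0.6, 0.7]
```
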

Procedures

Microsoft Excel 5.0 (Microsoft Corporation, 1985-94) was used to 
calculate reliability coefficients for raw gain, residual gain, and estimated
true gain scores, and the correlation between pretest scores and raw gain 
scores using all possible combinations of the range of values selected for 
this study. The formula used to calculate raw gain score reliability 
coefficients was the Zimmerman and Williams (1982a) formula: 



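$$\rho_{DD'} = \frac{\lambda\,\rho_{xx'} + \lambda^{-1}\rho_{yy'} - 2\rho_{xy}}{\lambda + \lambda^{-1} - 2\rho_{xy}} \tag{1}$$

where $\lambda$ is the ratio of the pretest standard deviation to the posttest
standard deviation, $\rho_{xx'}$ and $\rho_{yy'}$ are the pretest and posttest
reliability coefficients, and $\rho_{xy}$ is the correlation between the pre-
and posttest scores.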
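
Equation 1 is straightforward to evaluate for any combination of inputs. The
function below is a minimal sketch of that calculation in Python rather than
in the spreadsheet used for the study; the function and argument names are
assumptions made for the example, and the inputs are illustrative values
taken from the ranges described above.

```python
def raw_gain_reliability(lam, r_xx, r_yy, r_xy):
    """Equation 1: reliability of raw gain scores.

    lam  : ratio of pretest SD to posttest SD (lambda)
    r_xx : pretest reliability coefficient
    r_yy : posttest reliability coefficient
    r_xy : correlation between pretest and posttest scores
    """
    numerator = lam * r_xx + r_yy / lam - 2 * r_xy
    denominator = lam + 1 / lam - 2 * r_xy
    return numerator / denominator


# Equal reliabilities and equal standard deviations.
print(round(raw_gain_reliability(1.00, 0.90, 0.90, 0.70), 3))  # 0.667
# Posttest more reliable and more variable than the pretest.
print(round(raw_gain_reliability(0.50, 0.85, 0.95, 0.70), 3))  # 0.841
```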

The reliability of the residual gain score was calculated using the Linn
and Slinde (1977) formula:

$$\rho_{RR'} = \frac{\rho_{yy'} - \rho_{xy}^{2}\,(2 - \rho_{xx'})}{1 - \rho_{xy}^{2}} \tag{2}$$
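
A minimal sketch of Equation 2 as a Python function follows, assuming the
usual form of the Linn and Slinde expression in which the posttest
reliability is reduced by the squared pre-post correlation times two minus
the pretest reliability. The function name is hypothetical and the inputs are
illustrative.

```python
def residual_gain_reliability(r_xx, r_yy, r_xy):
    """Equation 2: reliability of residual gain scores.

    Lambda does not appear: the ratio of the standard deviations has no
    effect on this coefficient.
    """
    return (r_yy - r_xy**2 * (2 - r_xx)) / (1 - r_xy**2)


print(round(residual_gain_reliability(0.90, 0.90, 0.70), 3))  # 0.708

# Raising the posttest reliability helps more than raising the pretest
# reliability by the same amount.
higher_post = residual_gain_reliability(0.85, 0.90, 0.70)
higher_pre = residual_gain_reliability(0.90, 0.85, 0.70)
print(higher_post > higher_pre)  # True
```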



The reliability of estimated true gain scores was calculated using the
Lord (1956, 1963) formula:

$$\rho_{G\hat{G}}^{2} = \frac{\rho_{Gx}^{2} + \rho_{Gy}^{2} - 2\rho_{Gx}\rho_{Gy}\rho_{xy}}{1 - \rho_{xy}^{2}} \tag{3}$$



Lord supplied formulas for calculating the coefficients $\rho_{Gx}$ and
$\rho_{Gy}$ used in Equation 3 (Lord, 1956, 1963). Lord's formulas can be
algebraically manipulated to illustrate the effect of lambda on estimated true
gain scores. The resulting formulas are:

$$\rho_{Gx} = \frac{\sqrt{\lambda^{-1}}\,\rho_{xy} - \sqrt{\lambda}\,\rho_{xx'}}{\sqrt{\rho_{DD'}\,(\lambda + \lambda^{-1} - 2\rho_{xy})}} \tag{4}$$

$$\rho_{Gy} = \frac{\sqrt{\lambda^{-1}}\,\rho_{yy'} - \sqrt{\lambda}\,\rho_{xy}}{\sqrt{\rho_{DD'}\,(\lambda + \lambda^{-1} - 2\rho_{xy})}} \tag{5}$$
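
Equations 3 through 5 can be chained into a single calculation. The sketch
below assumes that Equation 3 is the squared multiple correlation of the true
gain on the pretest and posttest scores, and that the denominator shared by
Equations 4 and 5 reduces to the square root of the numerator of the raw gain
reliability formula. The function name is hypothetical. The calculation is
undefined when the true gain variance is zero, for example when lambda is one
and the correlation equals the common reliability.

```python
from math import sqrt


def estimated_true_gain_reliability(lam, r_xx, r_yy, r_xy):
    """Equations 3-5: reliability of estimated true gain scores."""
    # Shared denominator of Equations 4 and 5. It equals
    # sqrt(rho_DD' * (lam + 1/lam - 2*r_xy)), which simplifies to the
    # square root of the numerator of the raw gain reliability formula.
    true_gain_sd = sqrt(lam * r_xx + r_yy / lam - 2 * r_xy)
    r_gx = (r_xy / sqrt(lam) - sqrt(lam) * r_xx) / true_gain_sd  # Equation 4
    r_gy = (r_yy / sqrt(lam) - sqrt(lam) * r_xy) / true_gain_sd  # Equation 5
    # Equation 3
    return (r_gx**2 + r_gy**2 - 2 * r_gx * r_gy * r_xy) / (1 - r_xy**2)


print(round(estimated_true_gain_reliability(1.00, 0.90, 0.90, 0.70), 3))  # 0.667
```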



The standard formula for the correlation between the pretest scores and
gain scores can be algebraically manipulated in a similar fashion so that the
effect of lambda on the correlation between the pretest and gain scores can be
demonstrated. The formula becomes:

$$\rho_{xD} = \frac{\sqrt{\lambda^{-1}}\,\rho_{xy} - \sqrt{\lambda}}{\sqrt{\lambda + \lambda^{-1} - 2\rho_{xy}}} \tag{6}$$
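
The sign behavior of this correlation is easy to see numerically. The sketch
below assumes the lambda form of the standard expression for the correlation
between a pretest score and the raw gain; the function name is hypothetical
and the pre-post correlation of .70 is an illustrative input.

```python
from math import sqrt


def pretest_gain_correlation(lam, r_xy):
    """Equation 6: correlation between pretest scores and raw gain scores."""
    return (r_xy / sqrt(lam) - sqrt(lam)) / sqrt(lam + 1 / lam - 2 * r_xy)


for lam in (0.50, 1.00, 2.00):
    print(lam, round(pretest_gain_correlation(lam, 0.70), 3))
```

With these inputs the correlation is positive when lambda is .50 and negative
when lambda is 1.00 or 2.00, because the numerator has the same sign as the
pre-post correlation minus lambda.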



Tables 2 through 6 present raw gain score, residual gain score, and
estimated true gain score reliability coefficients, and the correlation
between the pretest scores and raw gain scores when pretest reliability
coefficients were .75, .80, .85, .90, and .95, respectively, and the posttest
reliability coefficients, lambda, and the correlation between the pre- and
posttest scores varied across the limited range of values selected for this
study.



Insert Tables 2 through 6 About Here 

Raw Gain Score Reliability Coefficients

Raw gain score reliability coefficients increased as the pre- and/or 
posttest reliability coefficients increased. Regardless of the value of 
lambda or the correlation between the pre- and posttest scores, most raw gain 
score reliability coefficients were equal to or greater than .70 when one of 
the pre- or posttest reliability coefficients was at least .85 and the other
was at least .90. 

When pre- and posttest reliability coefficients were equal, increasing
the pre- and posttest reliability coefficients by .05 resulted in increases in
raw gain score reliability coefficients ranging from .08 to .50. The median
increase was .12 and the majority of the increases fell in the .08 to .25
range. Regardless of the value for the pre- and posttest reliability
coefficients, the increases in raw gain score reliability coefficients were
relatively consistent as pre- and posttest reliability coefficients increased
by .05 while lambda and the correlation between the pre- and posttest scores
were held constant. Increases in raw gain score reliability coefficients
greater than .25 occurred when the pre- and posttest reliability coefficients
increased from .85 to .90 or .90 to .95 and the correlation between the pre-
and posttest scores was .80 or above. The large increases occurred because
the raw gain score reliability coefficient at the lower pre- and posttest
reliability value was very low.

When only one of the pre- or posttest reliability coefficients increased
by .05, raw gain score reliability coefficients increased by .01 to .25, with
the majority of the increases in the .02 to .11 range. Larger increases
tended to occur as the correlation between the pre- and posttest scores
approached maximum possible values, again because the raw gain score
reliability coefficient at the lower reliability value was very low.

Increasing the correlation between the pre- and posttest scores resulted
in decreases in raw gain score reliability coefficients. When the correlation
between the pre- and posttest scores increased by .10, decreases in raw gain
score reliability coefficients ranged from .02 to .38, with the vast majority
of the decreases ranging from .03 to .18. For specific values for the pre-
and posttest reliability coefficients and lambda, larger decreases in raw gain
score reliability coefficients were associated with increases in the
correlation between the pre- and posttest scores from the second highest value
used in this study to the maximum possible correlation. The larger decreases
occurred because the raw gain score reliability coefficient at the higher
correlation between the pre- and posttest scores was very low.

Larger decreases also tended to occur as lambda approached one and smaller
decreases tended to occur when lambda approached the largest or smallest
values of lambda selected for this study. Larger decreases were also
associated with lower values for the pre- and posttest reliability
coefficients.

The effect of lambda on raw gain score reliability coefficients was
dependent on the values for the pre- and posttest reliability coefficients.
When pre- and posttest reliability coefficients were identical, smaller values
for raw gain score reliability coefficients occurred as lambda approached one;
larger values for raw gain score reliability coefficients occurred as lambda
diverged from one. When pretest reliability coefficients were larger than
posttest reliability coefficients, raw gain score reliability coefficients
increased as lambda increased from one; when the pretest reliability
coefficient was .90 or above and lambda was 2.00, most raw gain score
reliability coefficients were .70 or larger. Conversely, when pretest
reliability coefficients were smaller than posttest reliability coefficients,
raw gain score reliability coefficients increased as lambda decreased from
one; when the posttest reliability coefficients were .90 or above and lambda
was .50, most raw gain score reliability coefficients were .70 or larger.
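
The pattern described in this paragraph can be reproduced from the raw gain
score reliability formula. The sketch below is illustrative only: it restates
the lambda form of that formula under a hypothetical function name and
evaluates it at a pre-post correlation of .70, first with equal reliabilities
and then with a more reliable pretest.

```python
def raw_gain_reliability(lam, r_xx, r_yy, r_xy):
    """Reliability of raw gain scores, with lam = pretest SD / posttest SD."""
    return (lam * r_xx + r_yy / lam - 2 * r_xy) / (lam + 1 / lam - 2 * r_xy)


lambdas = [0.50, 0.67, 0.80, 1.00, 1.25, 1.50, 2.00]

# Equal reliabilities: the coefficient is lowest at lambda = 1 and rises
# as lambda moves away from 1 in either direction. Reciprocal lambda
# values (.50 and 2.00, .80 and 1.25) give the same coefficient.
equal = [round(raw_gain_reliability(lam, 0.85, 0.85, 0.70), 3) for lam in lambdas]
print(equal)

# Pretest more reliable than posttest: the coefficient rises as lambda
# increases from 1.
unequal = [round(raw_gain_reliability(lam, 0.95, 0.85, 0.70), 3) for lam in lambdas[3:]]
print(unequal)
```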

Residual Gain Score Reliability Coefficients

Residual gain score reliability coefficients increased as the pre-
and/or posttest reliability coefficients increased. Most residual gain score
reliability coefficients were at least .70 when the posttest reliability
coefficient was at least .85 and the pretest reliability coefficient was at
least .90.

When the pre- and posttest reliability coefficients were equal,
increasing the pre- and posttest reliability coefficients by .05 resulted in
increases in residual gain score reliability coefficients ranging from .08 to
.47; all but one of the increases were .23 or less. The median increase was
.125. The increases in residual gain score reliability coefficients were
similar for specific values for the correlation between the pre- and posttest
scores; increasingly higher values for the correlation between the pre- and
posttest scores were associated with larger increases in residual gain score
reliability coefficients.

Increases in residual gain score reliability coefficients were more
dependent on the posttest reliability coefficient than the pretest reliability
coefficient. When pretest reliability coefficients were held constant and
posttest reliability coefficients were increased by .05, residual gain score
reliability coefficient increases ranged from .06 to .26, with all but two of
the increases less than .15. The median increase was .08. Increases in
residual gain score reliability coefficients were relatively consistent for
fixed values of the correlation between the pre- and posttest scores.

When posttest reliability coefficients were held constant and pretest
reliability coefficients were increased by .05, residual gain score
reliability coefficient increases ranged from .01 to .21, with all but two of
the increases less than .10. The median increase was .03. Increases in
residual gain score reliability coefficients were relatively consistent for
fixed values of the correlation between the pre- and posttest scores.

Residual gain score reliability coefficients decreased as the
correlation between pre- and posttest scores increased. Increasing the
correlation between pre- and posttest scores by .10 resulted in decreases in
residual gain score reliability coefficients ranging from .03 to .37. Most
decreases were in the .06 to .24 range. The median decrease was .11. All of
the decreases above .18 were associated with the largest possible value for
the correlation between the pre- and posttest scores. For fixed values of the
correlation between the pre- and posttest scores, differences in residual gain
score reliability coefficients increased as the pre- or posttest reliability
coefficients decreased.

Estimated True Gain Score Reliability Coefficients

Estimated true gain score reliability coefficients increased as the pre-
and/or posttest reliability coefficients increased. Regardless of the value
for lambda and the correlation between the pre- and posttest scores, most
estimated true gain score reliability coefficients were equal to or greater
than .70 when pre- and posttest reliability coefficients were at least .90.
When pre- and posttest reliability coefficients were equal and lambda was
equal to one, estimated true gain score reliability coefficients were
identical to raw gain score reliability coefficients.
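
The identity in the last sentence can be checked algebraically. As a sketch,
write $\rho$ for the common pre- and posttest reliability (a symbol
introduced here only for convenience), set lambda equal to one in the lambda
forms of Equations 1 and 3 through 5, and assume $\rho > \rho_{xy}$ so that
the true gain variance is positive:

```latex
\begin{align*}
\rho_{DD'} &= \frac{\rho + \rho - 2\rho_{xy}}{1 + 1 - 2\rho_{xy}}
            = \frac{\rho - \rho_{xy}}{1 - \rho_{xy}}, \\[4pt]
\rho_{Gx} &= \frac{\rho_{xy} - \rho}{\sqrt{2(\rho - \rho_{xy})}}
           = -\sqrt{\tfrac{1}{2}(\rho - \rho_{xy})}, \qquad
\rho_{Gy} = \frac{\rho - \rho_{xy}}{\sqrt{2(\rho - \rho_{xy})}}
           = -\rho_{Gx}, \\[4pt]
\frac{\rho_{Gx}^{2} + \rho_{Gy}^{2} - 2\rho_{Gx}\rho_{Gy}\rho_{xy}}{1 - \rho_{xy}^{2}}
          &= \frac{(\rho - \rho_{xy})(1 + \rho_{xy})}{(1 - \rho_{xy})(1 + \rho_{xy})}
           = \frac{\rho - \rho_{xy}}{1 - \rho_{xy}}.
\end{align*}
```

The first and last lines agree, so under these conditions the two
coefficients coincide for every admissible value of the pre-post correlation.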

When the correlation between the pre- and posttest scores was at its
maximum possible value, estimated true gain score reliability coefficients
increased dramatically for all values of lambda that were different from
one. For example, when the pre- and posttest reliability coefficients and the
correlation between the pre- and posttest scores were .90, the estimated true
gain score reliability coefficient was .81. However, when the correlation
between the pre- and posttest scores dropped to .895, .890, or .850, the
estimated true gain score reliability coefficient dropped to .69, .61, or .52,
respectively. Since it is unlikely that the correlation between pre- and
posttest scores would reach the exact maximum possible value, the high
estimated true gain score reliability coefficients have little practical
value. Therefore, estimated true gain score reliability coefficients obtained
when the correlation between the pre- and posttest scores was at its maximum
possible value will be ignored in this analysis.

When pre- and posttest reliability coefficients were equal, increasing
the pre- and posttest reliability coefficients by .05 resulted in differences
in estimated true gain score reliability coefficients ranging from decreases
of .45 to increases of .54. The median increase in estimated true gain score
reliability coefficients was .08. Most increases were in the .06 to .11
range. Larger increases in estimated true gain score reliability coefficients
were associated with increasing values for the correlation between the pre-
and posttest scores. All of the decreases in estimated true gain
score reliability coefficients were associated with values for the pre- and
posttest reliability coefficients and the correlation between the pre- and
posttest scores that were identical, thus producing very high estimated true
gain score reliability coefficients, as explained in the previous paragraph.
Regardless of the value of the pre- and posttest reliability coefficients, the
increases in estimated true gain score reliability coefficients were
relatively consistent as pre- and posttest reliability coefficients increased
by .05 while lambda and the correlation between the pre- and posttest scores
were held constant.

When only one of the pre- or posttest reliability coefficients was
increased by .05, differences between estimated true gain score reliability
coefficients ranged from a decrease of .12 to an increase of .24. The median
difference was .05. Most differences in estimated true gain score reliability
coefficients ranged from -.02 to .10. Most decreases in estimated true gain
score reliability coefficients were associated with lambda values of .50, .67,
1.50, or 2.00. Most large increases in estimated true gain score reliability
coefficients were associated with lambda values close to one or correlations
between pre- and posttest scores that approached their maximum possible value.

Increasing the correlation between the pre- and posttest scores resulted
in differential effects in estimated true gain score reliability coefficients.
Changes in estimated true gain score reliability coefficients ranged from a
decrease of .23 to an increase of .13 when differences associated with the
maximum possible value for the correlation between the pre- and posttest
scores were ignored. The median difference was -.05. Increases
tended to occur when the correlation between the pre- and posttest scores
approached its maximum possible value or when lambda values were less than
or equal to .67 or more than or equal to 1.50.

When pretest reliability coefficients were larger than posttest
reliability coefficients, estimated true gain score reliability coefficients
increased as lambda values increased from one. When pretest reliability
coefficients were smaller than posttest reliability coefficients, estimated
true gain score reliability coefficients increased as lambda values decreased
from one.

Correlation Between Pretest and Raw Gain Scores

The correlation between pretest scores and raw gain scores was primarily 
determined by the value of lambda. As lambda values decreased from .80 and 
the correlation between the pre- and posttest scores increased, the 
correlation between pretest scores and raw gain scores increased in value. 
Values for the correlation between pretest scores and raw gain scores were 
generally non-negative when lambda values were .67 or less. When lambda 
values were 1.00 or greater, the correlation between pretest scores and raw 
gain scores was negative across all values for the correlation between the 
pre- and posttest scores selected for this study. As lambda values increased 
from 1.25, maximum values for the correlation between pretest scores and raw 
gain scores were -.60. 
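
The dependence on lambda described above can be sketched with the standard 
expression for the correlation between a pretest score and the raw gain 
(posttest minus pretest). The sketch assumes that lambda is the ratio of the 
pretest standard deviation to the posttest standard deviation, as the 
discussion of lambda in this paper implies. The function and argument names 
are illustrative only. 

```python
import math


def pretest_gain_correlation(lam: float, r_xy: float) -> float:
    """Correlation between pretest scores X and raw gains G = Y - X.

    Assumes lam = sd_pre / sd_post. Starting from
        cov(X, G) = r_xy * sd_pre * sd_post - sd_pre**2
        var(G)    = sd_pre**2 + sd_post**2 - 2 * r_xy * sd_pre * sd_post
    and dividing through by sd_post gives
        r_xg = (r_xy - lam) / sqrt(1 + lam**2 - 2 * r_xy * lam)
    """
    if lam <= 0.0:
        raise ValueError("lambda must be positive")
    if not -1.0 < r_xy < 1.0:
        raise ValueError("r_xy must lie strictly between -1 and 1")
    return (r_xy - lam) / math.sqrt(1.0 + lam**2 - 2.0 * r_xy * lam)


if __name__ == "__main__":
    # Tabulate the correlation for a few illustrative lambda values.
    for lam in (0.50, 0.67, 0.80, 1.00, 1.25, 1.50, 2.00):
        row = [f"{pretest_gain_correlation(lam, r):+.2f}" for r in (0.5, 0.7, 0.9)]
        print(f"lambda={lam:.2f}", row)
```

The sign of the result is the sign of r_xy - lam, so under this assumption the 
correlation is negative whenever lambda is 1.00 or greater and is non-negative 
only when the pre/post correlation is at least as large as lambda. 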

Raw Gain Score Reliability Coefficients and the Correlation Between Pre- and 
Posttest Scores 

The two major psychometric objections to the use of raw gain scores, as 
noted earlier in this paper, are the low reliability of raw gain scores and 
the spurious negative correlation between pretest scores and raw gain scores. 
When lambda values were .67 or less (i.e., when the posttest standard deviation 
was at least thirty-three percent larger than the pretest standard deviation), 
the correlation between raw gain scores and pretest scores was either 
positive or a small negative value. When lambda values were less than 
or equal to .67 and pre- and posttest reliability coefficients were at least 
.85, most raw gain score reliability coefficients were at least .70. 
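
The reliability figures quoted above can be explored with the classical 
formula for the reliability of a difference score. This is a minimal sketch, 
assuming lambda is the ratio of the pretest to the posttest standard deviation 
and that measurement errors are uncorrelated. The names are illustrative, and 
the formula is the textbook form, not necessarily the exact computation used 
in the simulations. 

```python
def raw_gain_reliability(lam: float, rel_pre: float, rel_post: float,
                         r_xy: float) -> float:
    """Classical reliability of the raw gain score G = Y - X.

    With lam = sd_pre / sd_post (an assumption of this sketch):
        rel_g = (lam**2 * rel_pre + rel_post - 2 * r_xy * lam)
                / (lam**2 + 1 - 2 * r_xy * lam)
    """
    if lam <= 0.0:
        raise ValueError("lambda must be positive")
    if not -1.0 < r_xy < 1.0:
        raise ValueError("r_xy must lie strictly between -1 and 1")
    true_var = lam**2 * rel_pre + rel_post - 2.0 * r_xy * lam
    observed_var = lam**2 + 1.0 - 2.0 * r_xy * lam
    return true_var / observed_var


if __name__ == "__main__":
    # Illustrative values only: reliabilities of .85 and a few correlations.
    for r_xy in (0.3, 0.5, 0.7):
        rel_g = raw_gain_reliability(0.67, 0.85, 0.85, r_xy)
        print(f"lambda=0.67 r_xy={r_xy:.1f} raw gain reliability={rel_g:.3f}")
```

For example, with lambda = .67, both reliabilities equal to .85, and a 
pre/post correlation of .50, this formula gives a value slightly above .70. 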

Comparison of Raw Gain, Residual Gain, and 
Estimated True Gain Score Reliability Coefficients and 
the Correlation Between Pre- and Posttest Scores 

In situations where lambda was equal to one, residual gain score 
reliability coefficients were higher than raw gain or estimated true gain score 
reliability coefficients. Under these constraints, raw gain scores are not 
appropriate due to the negative correlation between raw gain and pretest 
scores. Residual gain scores would seem to be preferable under these 
constraints, particularly when pre- and posttest reliability coefficients were 
less than .90. At values at or above .90, the differences between residual 
gain and estimated true gain scores were minimal. In practice, this means 
that residual gain scores are most likely to be preferable when the pre- and 
posttest score distributions can be expected to have equal variability. 
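
The comparison at lambda equal to one can be illustrated with a rough sketch. 
It assumes the residual gain is the posttest score minus its least-squares 
prediction from the pretest score, and uses the classical test theory 
expression for the reliability of that residual. This expression and the 
function names are assumptions made for illustration and may differ from the 
formula used in the study. 

```python
def residual_gain_reliability(rel_pre: float, rel_post: float,
                              r_xy: float) -> float:
    """Reliability of the residual gain Y - b*X, with b = r_xy * sd_post / sd_pre.

    Classical test theory form, assuming uncorrelated errors:
        rel_res = (rel_post - r_xy**2 * (2 - rel_pre)) / (1 - r_xy**2)
    """
    if not -1.0 < r_xy < 1.0:
        raise ValueError("r_xy must lie strictly between -1 and 1")
    return (rel_post - r_xy**2 * (2.0 - rel_pre)) / (1.0 - r_xy**2)


def raw_gain_reliability_equal_sd(rel_pre: float, rel_post: float,
                                  r_xy: float) -> float:
    """Raw gain score reliability for the special case lambda = 1 (equal SDs)."""
    if not -1.0 < r_xy < 1.0:
        raise ValueError("r_xy must lie strictly between -1 and 1")
    return (rel_pre + rel_post - 2.0 * r_xy) / (2.0 - 2.0 * r_xy)


if __name__ == "__main__":
    # Compare the two coefficients for equal pre- and posttest reliabilities.
    r_xy = 0.5
    for rel in (0.70, 0.80, 0.90):
        res = residual_gain_reliability(rel, rel, r_xy)
        raw = raw_gain_reliability_equal_sd(rel, rel, r_xy)
        print(f"reliability={rel:.2f} residual={res:.3f} raw={raw:.3f}")
```

With equal pre- and posttest reliabilities and a positive pre/post 
correlation, the residual gain coefficient from this sketch is never lower 
than the raw gain coefficient at lambda equal to one, which agrees in 
direction with the pattern reported above. 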

Estimated true gain score reliability coefficients were higher than raw 
gain or residual gain score reliability coefficients under several 
circumstances. When the correlation between pre- and posttest scores was 
close to its maximum possible value, estimated true gain score reliability 
coefficients were always higher than raw gain or residual gain score 
reliability coefficients (other than when lambda was equal to one). 

When treatment markedly affects the standard deviation of the posttest 
scores (and thus lambda), estimated true gain scores may also be preferable. 
When lambda values were .50 or 2.00, estimated true gain score reliability 
coefficients were higher than raw gain or residual gain score reliability 
coefficients regardless of the value for the pre- or posttest reliability 
coefficients or the correlation between the pre- and posttest scores. When 
lambda values were .67 or less or 1.50 or more, values for estimated true gain 
score reliability coefficients were quite consistent across all values for the 
correlation between the pre- and posttest scores, while raw gain and residual 
gain score reliability coefficients were much more affected by the correlation 
between the pre- and posttest scores; in most cases estimated true gain score 
reliability coefficients were higher than raw gain or residual gain score 
reliability coefficients. Thus estimated true gain scores would seem to be 
preferable when the correlation between pre- and posttest scores is likely to 
be quite high, such as in studies of short duration or when the rank of 
students on the dependent variable is unlikely to change, or when the pre- or 
posttest standard deviation is at least 33% larger than the post- or pretest 
standard deviation, respectively. 

In many cases, the differences between raw gain score, residual gain 
score, and/or estimated true gain score reliability coefficients were minimal. 
Raw gain scores would seem to be preferable, on grounds of parsimony, 
in circumstances where lambda is less than .80 and the correlation between raw 
gain scores and pretest scores is non-negative. 

Summary 

The purpose of this study was to investigate the effect that different 
combinations of the pre- and posttest reliability coefficients, lambda, and/or 
the correlation between the pre- and posttest scores had on the reliability of 
raw gain, residual gain, and/or estimated true gain scores and the correlation 
between raw gain scores and pretest scores. The reliability coefficients of 
the pre- and posttest scores determined to a large extent whether raw gain 
score, residual gain score, and estimated true gain scores were reliable. 

Some specific findings include: 

• Lambda values of one or greater than one were associated with 
negative correlations between pretest scores and raw gain scores, 
thus discouraging the use of raw gain scores in that situation. 

• Residual gain scores were generally most reliable when lambda values 
were around one. 

• Estimated true gain scores seem to be preferred in situations when 
lambda values were more extreme or when the correlation between the 
pre- and posttest scores approached the maximum possible value. 

• The principle of parsimony argues for the use of raw gain scores when 
reliability coefficients of the three types of gain scores are 
similar and the correlation between the pretest scores and raw gain 
scores is close to zero or is positive. 

Schools are often evaluated on their ability to increase the achievement 
of their students. When communicating about student progress and evaluating 
school programs, most principals and teachers talk about the gains of 
students. There is also a recognized need for reliable gain scores in quasi- 
experimental research and evaluation designs. In educational and social 
science research there are many situations where random selection of subjects 
for treatment and control groups is not possible. However, substantive 
questions remain about the effectiveness of particular treatments within these 
settings. 

This study has demonstrated that, for standardized tests of educational 
research such as those commonly encountered in elementary and secondary 
schools, gain scores can be useful indicators of progress if the user is 
knowledgeable about the proper type of gain score to use and about its 
proper interpretation. This study has investigated raw gain scores, 
residual gain scores, and estimated true gain scores and has generated some 
guidance regarding the appropriate matching of these tools to the 
educational/measurement contexts in which their use is most likely to yield 
defensible, interpretable results. A fruitful line of inquiry for the future 
may be investigation of the utility of these tools in other measurement 
contexts. 

Table 1 

Maximum Values for the Correlation Between the Pre- and Posttest Scores 



Pre- and Posttest Reliability Coefficients 

Table 2 

Raw Gain Score, Residual Gain Score, and Estimated True 
Gain Score Reliability Coefficients Assuming tL.y =-75 

Table 3 

Raw Gain Score, Residual Gain Score, and Estimated True 
Gain Score Reliability Coefficients Assuming 

Table 6 

Raw Gain Score, Residual Gain Score, and Estimated True 
Gain Score Reliability Coefficients Assuming = 

R - Raw Gain Score Reliability Coefficients 

Re - Residual Gain Score Reliability Coefficients 

Est - Estimated True Gain Score Reliability Coefficients 

Pxg - Correlation between pretest scores and raw gain scores 

* - Cannot be calculated since Pxy exceeds maximum possible value 